A syntax-based part-of-speech analyser
Abstract
There are two main methodologies for constructing the knowledge base of a natural language analyser: the linguistic and the data-driven. Recent state-of-the-art part-of-speech taggers are based on the data-driven approach. Because of the known feasibility of the linguistic rule-based approach at related levels of description, the success of the data-driven approach in part-of-speech analysis may appear surprising. In this paper, a case is made for the syntactic nature of part-of-speech tagging. A new tagger of English that uses only linguistic distributional rules is outlined and empirically evaluated. Tested against a benchmark corpus of 38,000 words of previously unseen text, this syntax-based system reaches an accuracy above 99%. Compared to the 95-97% accuracy of its best competitors, this result suggests the feasibility of the linguistic approach also in part-of-speech analysis.
Similar resources
Speech Recognition System For Spoken Japanese Sentences
A speech recognition system for continuously spoken Japanese simple sentences is described. The acoustic analyser based on a psychological assumption for phoneme identification can represent the speech sound by a phoneme string in an expanded sense which contains acoustic features such as buzz and silence as well as ordinary phonemes. Each item of the word dictionary is written in Roman letters...
Design and Implementation of an Intelligent Part of Speech Generator
The aim of this paper is to report on an attempt to design and implement an intelligent system capable of generating the correct part of speech for a given sentence while the sentence is totally new to the system and not stored in any database available to the system. It follows the same steps a normal individual does to provide the correct parts of speech using a natural language processor. It...
Fragmentation And Part Of Speech Disambiguation
That at least some syntax is necessary to support semantic processing is fairly obvious. To know exactly how much syntax is needed, however, and how and when to apply it, is still an open and crucial, albeit old, question. This paper discusses the solutions used in a semantic analyser of French called SABA, developed at the University of Liege, Belgium. Specifically, we shall argue in favor of ...
Fine-Grain Morphological Analyzer and Part-of-Speech Tagger for Arabic Text
Morphological analyzers and part-of-speech taggers are key technologies for most text analysis applications. Our aim is to develop a part-of-speech tagger for annotating a wide range of Arabic text formats, domains and genres including both vowelized and non-vowelized text. Enriching the text with linguistic analysis will maximize the potential for corpus re-use in a wide range of applications....
A Chinese Efficient Analyser Integrating Word Segmentation, Part-Of-Speech Tagging, Partial Parsing and Full Parsing
This paper introduces an efficient analyser for the Chinese language, which efficiently and effectively integrates word segmentation, part-of-speech tagging, partial parsing and full parsing. The Chinese efficient analyser is based on a Hidden Markov Model (HMM) and an HMM-based tagger. That is, all the components are based on the same HMM-based tagging engine. One advantage of using the same s...
Publication date: 1995